Abstract: A new negative result for nonparametric distribution estimation of binary ergodic processes is shown. We study the problem of estimating the distribution to any prescribed degree of accuracy, and show that for any countable class of estimators there is a zero-entropy binary ergodic process that is inconsistent with the class of estimators. Our result is different from other negative results for universal forecasting schemes of ergodic processes.

Index Terms: ergodic process, cutting and stacking, nonparametric estimation, computable function.

I. Introduction 

Let $X_1, X_2, \ldots$ be a binary-valued ergodic process and $P$ be its distribution. In this paper we study nonparametric estimation of binary-valued ergodic processes with any degree of accuracy. Let $S$ and $\Omega$ be the set of finite binary strings and the set of infinite binary sequences, respectively. Let $\Delta(x) := \{x\omega \mid \omega \in \Omega\}$, where $x\omega$ is the concatenation of $x \in S$ and $\omega$, and write $P(x) = P(\Delta(x))$. For $x \in S$, $|x|$ is the length of $x$. Let $\mathbb{N}$, $\mathbb{Z}$, and $\mathbb{Q}$ be the set of natural numbers, the set of integers, and the set of rational numbers, respectively. From the ergodic theorem, there is a function $r$ such that for $x \in S$, $n, k \in \mathbb{N}$,

$$
\begin{gathered}
P\Big(\bigcup\Big\{\Delta(y) \;\Big|\; \Big|P(x) - \sum_{i=1}^{|y|-|x|+1} \frac{I_{y_i^{i+|x|-1} = x}}{|y|-|x|+1}\Big| > 1/k,\ |y| = n\Big\}\Big) < r(n,k,x),\\
\forall x, k\quad \lim_{n} r(n,k,x) = 0,
\end{gathered}
\tag{1}
$$

where $I$ is the indicator function and $y_i^j = y_i y_{i+1} \cdots y_j$ for $y = y_1 \cdots y_n$, $i \le j \le n$. $r$ is called a convergence rate. If $r$ is given, we know how large a sample is necessary to estimate the distribution with prescribed accuracy. However, it is known that there is no universal convergence rate for the ergodic theorem. If $r$ is not known, the ergodic theorem does not help to estimate the distribution with prescribed accuracy. Here a natural question arises: for any binary-valued ergodic process, is it always possible to estimate the distribution with any degree of accuracy with positive probability? We show that this problem has a negative answer, i.e., for any countable class of estimators there is a zero-entropy binary ergodic process that is not estimated from this class of estimators with positive probability. In particular, since the set of computable functions is countable, we see that there is a zero-entropy binary ergodic process that is inconsistent with computable estimators. Our result is not derived from other negative results for universal forecasting schemes of ergodic processes, see Remark 1.
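As an illustration of the empirical average that appears inside (1), the following minimal Python sketch counts how often a pattern $x$ occurs among the $|y|-|x|+1$ sliding windows of a sample $y$. The function name `empirical_frequency` and the encoding of binary words as Python strings are assumptions made for this example only.

```python
from fractions import Fraction


def empirical_frequency(x: str, y: str) -> Fraction:
    """Fraction of the |y|-|x|+1 sliding windows of y that equal x."""
    windows = len(y) - len(x) + 1
    if windows <= 0:
        raise ValueError("sample y must be at least as long as pattern x")
    hits = sum(1 for i in range(windows) if y[i:i + len(x)] == x)
    return Fraction(hits, windows)


# Usage: relative frequency of the pattern 01 in a short sample.
print(empirical_frequency("01", "0101101"))
```

Exact rationals are used so that the value can be compared with $P(x)$ without rounding; how close the two are for a given sample length is exactly what the rate $r$ would quantify.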
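Let $x \sqsubseteq y$ if $x$ is a prefix of $y$. $f$ is called an estimator if

$$
\begin{gathered}
\exists D_f \subseteq S \times \mathbb{N} \times S,\quad f : D_f \to \mathbb{Q}\ \text{ and}\\
f(x,k,y) \text{ is defined, i.e., } (x,k,y) \in D_f\ \Rightarrow\ \forall z \sqsupseteq y\ \ f(x,k,z) = f(x,k,y).
\end{gathered}
\tag{2}
$$

For $\omega \in \Omega$, let $f(x,k,\omega) := f(x,k,y)$ if $f(x,k,y)$ is defined and $y \sqsubseteq \omega$. We say that $f$ estimates $P$ if

$$
P\big(\omega \mid \forall x, k\ \ f(x,k,\omega) \text{ is defined and } |P(x) - f(x,k,\omega)| < 1/k\big) > 0.
\tag{3}
$$

Here $\omega$ is a sample sequence, and the minimum length of $y \sqsubseteq \omega$ for which $f(x,k,y)$ is defined is a stopping time.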

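In this paper, we construct an ergodic process that is not estimated from any given countable set of estimators:

Theorem 1.

$$
\begin{gathered}
\forall F : \text{countable set of estimators}\quad \exists P \text{ ergodic and zero entropy}\quad \forall f \in F\\
P\big(\omega \mid \forall x, k\ \ f(x,k,\omega) \text{ is defined and } |P(x) - f(x,k,\omega)| < 1/k\big) = 0.
\end{gathered}
$$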

We say that $P$ is effectively estimated if there is a partial computable $f$ that satisfies (2) and (3). Since the set of partial computable estimators is countable, we have

Corollary 1. There is a zero entropy ergodic process that is not effectively estimated.

If $r$ in (1) is computable then it is easy to see that $P$ is effectively estimated.$^1$ For example, i.i.d. processes of finite alphabet are effectively estimated, see Leeuw et al. [3].

As stated above, a difficulty of effective estimation of ergodic processes comes from the fact that there is no universal convergence rate for the ergodic theorem. In Shields [7, p. 171], it is shown that for any given decreasing function $r$, there is an ergodic process that satisfies

$$
\exists N\ \forall n > N\quad P\Big(\Big|P(1) - \sum_{i=1}^{n} I_{X_i = 1}/n\Big| \ge 1/2\Big) > r(n).
\tag{4}
$$

In particular, if $r$ is chosen such that $r$ decreases to 0 asymptotically slower than any computable function, then $r$ is not computable. In V'yugin [8], a binary-valued computable stationary process with incomputable convergence rate is shown.

$^1$ More precisely, if $r$ is upper semi-computable (approximated from above by some algorithm), $P$ is effectively estimated.

It is possible that an ergodic process is effectively estimated even if the convergence rate is not computable (nor upper semi-computable).

Theorem 2. For any decreasing $r$, there is a zero entropy ergodic process that is effectively estimated and satisfies (4).

Remark 1. In Cover [2], two problems about prediction of ergodic processes are posed. Problem 1: Is there a universal scheme $f$ such that $|f(X_0^{n-1}) - P(X_n \mid X_0^{n-1})| \to 0$ a.s. as $n \to \infty$ for all binary-valued ergodic $P$? Problem 2: Is there a universal scheme $f$ such that $|f(X_{-n}^{-1}) - P(X_0 \mid X_{-\infty}^{-1})| \to 0$ a.s. as $n \to \infty$ for all binary-valued ergodic $P$? Problem 2 was affirmatively solved by Ornstein [5], [9]. Problem 1 has a negative answer as follows (Bailey, Ryabko, see [1], [6], [4]): For any $f$ there is a binary-valued ergodic process $X_1, X_2, \ldots$ such that

$$
P\Big(\limsup_{n \to \infty} \big|f(X_0^{n-1}) - P(X_n \mid X_0^{n-1})\big| > 0\Big) > 0.
\tag{5}
$$

It is not difficult to see that the above result extends to a countable class $\{f_1, f_2, \ldots\}$, i.e., for any $\{f_1, f_2, \ldots\}$ there is an ergodic process such that (5) holds for all $f_1, f_2, \ldots$. However, this result does not imply Theorem 1. In fact, there is a finite-valued ergodic process that is effectively estimated but satisfies (5), see below. Roughly speaking, one of the differences between these problems is that in Problem 1 we have to estimate $P(X_n \mid X_0^{n-1})$ from $X_0^{n-1}$, whereas in our estimation scheme the sample size is a stopping time and we can use a sufficiently large sample $X_0^m$, $m > n$, to estimate $P(X_0^n)$.

In Ryabko [6], the process for (5) is constructed as follows: First consider an ergodic Markov process $Y_1, Y_2, \ldots$ on a countable state space $\{0, 1, 2, \ldots\}$ with $P_{i,i+1} := 1/2$, $P_{i,0} := 1/2$ for $i = 0, 1, \ldots$, where $P_{i,j}$ is the transition probability from $i$ to $j$. The process $X_i \in \{0, 1, 2\}$, $i \in \mathbb{N}$, is defined by $X_i = 0$ if $Y_i = 0$, and $P(X_i = 1) = p_j$, $P(X_i = 2) = 1 - p_j$ if $Y_i = j \ge 1$. Then $\{X_i\}$ is ergodic. In particular, for any $\{f_1, f_2, \ldots\}$, we can choose $\{p_1, p_2, \ldots\}$ such that (5) holds for all $f_i$, $i \in \mathbb{N}$. However, $\{X_i\}$ is effectively estimated for any $\{p_i\}$ as follows. Let

$$
I_j = \{i \mid X_i = 0 \text{ and } X_k \ne 0 \text{ for } i < k \le i + j\}.
$$

From the construction, we have

$$
X_i = 0 \text{ and } X_k \ne 0 \text{ for } i < k \le i + j\ \Rightarrow\ Y_{i+j} = j.
$$

Since the above event has a positive probability, $I_j$ is an infinite set with probability one. Since $Y_{i+j} = j$ for $i \in I_j$, $\{X_{i+j}\}_{i \in I_j}$ are i.i.d. random variables with $P(X_{i+j} = 1) = p_j$, $P(X_{i+j} = 2) = 1 - p_j$. Thus we can estimate $p_j$ with any degree of accuracy for all $j$. Since the process $\{X_i\}$ is determined from $\{p_i\}$, it is effectively estimated.
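The estimation argument above can be tried numerically. The sketch below simulates the Markov-driven process and averages the indicator of $X_{i+j} = 1$ over the indices $i \in I_j$ found in a finite sample. The function names `simulate` and `estimate_p`, the seed, and the choice $p_j = 1/(j+1)$ in the usage lines are assumptions of this example and do not come from [6].

```python
import random


def simulate(p, n, seed=0):
    """Return n samples of X; p(j) is the probability of emitting 1 when Y = j >= 1."""
    rng = random.Random(seed)
    y, xs = 0, []
    for _ in range(n):
        # Transition of Y: move from i to i+1 or reset to 0, each with probability 1/2.
        y = y + 1 if rng.random() < 0.5 else 0
        if y == 0:
            xs.append(0)
        else:
            xs.append(1 if rng.random() < p(y) else 2)
    return xs


def estimate_p(xs, j):
    """Average of 1{X_{i+j} = 1} over the i in I_j visible in xs (None if there are none)."""
    hits = total = 0
    for i, x in enumerate(xs):
        if x == 0 and i + j < len(xs) and all(v != 0 for v in xs[i + 1:i + j + 1]):
            total += 1
            hits += xs[i + j] == 1
    return hits / total if total else None


# Usage with the hypothetical choice p_j = 1/(j+1).
sample = simulate(lambda j: 1.0 / (j + 1), 100000, seed=1)
print(estimate_p(sample, 1), estimate_p(sample, 2))
```

Because the number of usable indices in $I_j$ shrinks roughly like $2^{-j}$, larger $j$ needs a longer sample for the same accuracy, which matches the role of the stopping time in the estimation scheme.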

II. Cutting and Stacking 

We construct the ergodic processes in Theorems 1 and 2 by the cutting and stacking method. The basic idea of our constructions is similar to that of [7]. In this section, we briefly introduce some notions about cutting and stacking, which we need in the proof.

Let $X := [0, 1]$ and consider Lebesgue measure $\lambda$ on $(X, \mathcal{B})$, where $\mathcal{B}$ is the Borel $\sigma$-field. We construct an ergodic transformation $T$ on $(X, \mathcal{B}, \lambda)$. Let $C := (L_1, L_2, \ldots, L_n)$ be an ordered set of mutually disjoint intervals of equal length. $C$ is called a column. $w(C) := \lambda(L_1)$, $h(C) := n$, and $S(C) := \cup_i L_i$ are called the width, height, and support of $C$, respectively. Two columns are called disjoint if their supports are disjoint. For two disjoint columns $C := (L_1, L_2, \ldots, L_n)$ and $C' := (L'_1, L'_2, \ldots, L'_m)$ of the same width, let $C * C' := (L_1, \ldots, L_n, L'_1, \ldots, L'_m)$. For a given column $C = \{L_i\}_{1 \le i \le n}$, two disjoint columns $C_L := \{L_i^L\}_{1 \le i \le n}$ and $C_R := \{L_i^R\}_{1 \le i \le n}$ are called a partition of $C$ if $L_i = L_i^L \cup L_i^R$ and $L_i^L \cap L_i^R = \emptyset$ for $1 \le i \le n$, and $w(C_R) = w(C_L) = \frac{1}{2} w(C)$. In order to specify the partition, we require that the left endpoint of $L_1^L$ is less than that of $L_1^R$. Let $C * C := C_L * C_R$, where $C_L$ and $C_R$ are the partition of $C$, see Fig. 1. Let $C(0) := C$ and $C(n+1) := C(n) * C(n)$ for $n \ge 0$. We have $w(C(n+1)) = \frac{1}{2} w(C(n)) = 2^{-(n+1)} w(C)$ and $h(C(n+1)) = 2 h(C(n)) = 2^{n+1} h(C)$.
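The operations $C * C'$, $C * C$, and $C(n)$ can be sketched in a few lines of Python. In this illustration a column is stored as the list of left endpoints of its levels together with a common width; the class name `Column` and its method names are assumptions of the example, and half-open versus closed endpoints are ignored because they do not affect widths or heights.

```python
from fractions import Fraction


class Column:
    """Ordered levels of equal width; each level is represented by its left endpoint."""

    def __init__(self, lefts, width):
        self.lefts = list(lefts)
        self.width = Fraction(width)

    @property
    def height(self):
        return len(self.lefts)

    def stack(self, other):
        """C * C' for two columns of the same width (disjointness is not checked)."""
        if self.width != other.width:
            raise ValueError("columns must have the same width")
        return Column(self.lefts + other.lefts, self.width)

    def cut_and_stack(self):
        """C * C := C_L * C_R, where C_L holds the left halves and C_R the right halves."""
        half = self.width / 2
        left = Column(self.lefts, half)
        right = Column([a + half for a in self.lefts], half)
        return left.stack(right)

    def iterate(self, n):
        """C(n): apply cut_and_stack n times."""
        col = self
        for _ in range(n):
            col = col.cut_and_stack()
        return col


# Usage: start from a single level of width 1/2 with left endpoint 1/2.
c = Column([Fraction(1, 2)], Fraction(1, 2))
c3 = c.iterate(3)
print(c3.width, c3.height)
```

The usage lines give a direct check of $w(C(n)) = 2^{-n} w(C)$ and $h(C(n)) = 2^{n} h(C)$, since the product of width and height stays equal to the measure of the support.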

A transformation $T$ is defined on a column $C := (L_1, \ldots, L_n)$ by 1) $T(L_i) = L_{i+1}$ and $T(a_i + y) = a_{i+1} + y$, where $0 \le y < w(C)$ and $a_i$ is the left endpoint of $L_i$ for $1 \le i \le n - 1$, and 2) $T$ is not defined on $L_n$. Then $T$ is a measure preserving transformation defined on the intervals of $C$ except for $L_n$. Similarly, $T^{-1}$ is defined on $C$ except for $L_1$. Note that $T$ (and $T^{-1}$) defined by $C * C$ extends $T$ (and $T^{-1}$) defined by $C$, respectively.

We say that a sequence of columns $C_1, C_2, \ldots$ is extending if $S(C_n) \subseteq S(C_{n+1})$ and $T$ defined by $C_{n+1}$ extends $T$ defined by $C_n$ for all $n$. Suppose that there is an extending sequence of columns $C_1, C_2, \ldots$ such that $\lim_n w(C_n) = 0$ and $\lambda(\cup_n S(C_n)) = 1$. Then we see that an invertible measure-preserving transformation $T : X \to X$ is uniquely defined except for a null set. $T$ is ergodic as follows: Suppose that $A \subset X$ is a nontrivial invariant set, i.e., $T(A) = A$ and $0 < \lambda(A) < 1$. Since $\lim_n w(C_n) = 0$ and $\lambda(\cup_n S(C_n)) = 1$, from the Lebesgue density theorem, there are $n$ and $L_i, L_j \in C_n$ such that $1/2 < \lambda(A \cap L_i)/\lambda(L_i)$ and $1/2 < \lambda(A^c \cap L_j)/\lambda(L_j)$. Then $\lambda(A \cap A^c) = \lambda(T^{j-i}(A) \cap A^c) \ge \lambda(T^{j-i}(A \cap L_i) \cap A^c \cap L_j) > 0$, which is a contradiction.

Let $X^0$ and $X^1$ be measurable sets of $X$ such that $X^0 \cup X^1 = X$ and $X^0 \cap X^1 = \emptyset$. For $\xi \in X$, let $\phi(\xi) = \cdots \xi(-1)\xi(0)\xi(1) \cdots \in \{0,1\}^{\mathbb{Z}}$, where $\xi(i) = 0$ if $T^i(\xi) \in X^0$ and 1 else, for all $i \in \mathbb{Z}$. Let $P := \lambda \circ \phi^{-1}$. If $T$ is an invertible ergodic transformation, $P$ is an invertible ergodic process on $\{0,1\}^{\mathbb{Z}}$ and is called the $(T, X^0, X^1)$ process. We say that a column $(L_1, \ldots, L_n)$ is compatible with $(X^0, X^1)$ if $\forall 1 \le i \le n$, $L_i \subseteq X^0$ or $L_i \subseteq X^1$, and in that case let $s(L_i) := 0$ if $L_i \subseteq X^0$ and 1 else for $1 \le i \le n$, and $s(L_1, \ldots, L_n) := s(L_1) \cdots s(L_n)$.
Fig. 1. Cutting and stacking

III. Proof of Theorem 1

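Let $F := \{f_1, f_2, \ldots\}$ be a countable set of estimators. Consider the following three statements:

$$
\forall P \text{ ergodic on } \Omega\quad \exists f \in F\quad P\big(\omega \mid \forall x, k\ \ |P(x) - f(x,k,\omega)| < 1/k\big) > 0,
\tag{6}
$$

$$
\forall P \text{ ergodic on } \Omega\quad \exists f \in F\quad P\big(\omega \mid \forall m, k\ \ |P(0^m) - f(0^m,k,\omega)| < 1/k\big) > 0,
\tag{7}
$$

$$
\forall P \text{ ergodic on } \Omega\quad \exists \hat f \in \hat F\quad P\big(\omega \mid \hat f(\omega) = R\big) > 0,
\tag{8}
$$

where $0^m$ is the $m$-times concatenation of 0's,

$$
\begin{aligned}
R &:= \{(n,m) \mid P(0^m) < 2^{-(n+2)}\},\\
\hat f &:= \{(n,m,y) \mid \exists k\ \ f(0^m,k,y) + 1/k < 2^{-(n+2)}\},\\
\hat F &:= \{\hat f \mid f \in F\},\\
\hat f(x) &:= \{(n,m) \mid (n,m,y) \in \hat f,\ y \sqsubseteq x\},\qquad \hat f(\omega) := \cup_{x \sqsubseteq \omega} \hat f(x).
\end{aligned}
$$

Then we have (6) $\Rightarrow$ (7) $\Rightarrow$ (8), where (7) $\Rightarrow$ (8) follows from

$$
\forall m, k\ \ |P(0^m) - f(0^m,k,\omega)| < 1/k\ \Rightarrow\ \hat f(\omega) = R.
\tag{9}
$$

Therefore, in order to show Theorem 1, it is sufficient to negate (8) (Lemma 1).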

Lemma 1. For any countable $\hat F$, there is a zero entropy ergodic $P$ such that

$$
\forall \hat f \in \hat F,\quad P\big(\omega \mid \hat f(\omega) = R\big) = 0.
$$

Proof) Let $\hat F := \{f_1, f_2, \ldots\}$. We construct an ergodic process inductively by the cutting and stacking method such that if there are $a, x, e$ such that $a \in f_e(x)$ at some stage, then the process is made to falsify $f_e$, i.e., $a \notin R$.

Let $X^0 := [0, 1/2]$ and $X^1 := (1/2, 1]$. For $n \ge 1$, let $\Delta_n := (2^{-(n+1)}, 2^{-n}]$. We construct inductively an extending sequence of columns $C_0, C_1, \ldots$, which are compatible with $(X^0, X^1)$, with $\lim_n w(C_n) = 0$ and $\cup_n S(C_n) = \cup_{t \in J} \Delta_t \cup X^1$, where $J$ is defined simultaneously with the columns.

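Stage 0: Let $C_0 := X^1$, $G_0 := \emptyset$, and $k_0 := 1$.

Stage $n$: Suppose that $G_{n-1}$ is defined and $C_0, \ldots, C_{n-1}$ are extending and compatible with $(X^0, X^1)$. Let $C_{n-1} := (L_1, \ldots, L_{h_{n-1}})$ and suppose that $w(C_{n-1}) = 2^{-k_{n-1}}$ for $k_{n-1} \in \mathbb{N}$. Let

$$
\begin{aligned}
F_n &:= \{(\langle e,i \rangle, m) \mid 1 \le e \le n,\ 1 \le i \le h_{n-1},\ (\langle e,i \rangle, m) \in f_e(s(L_i \cdots L_{h_{n-1}}))\},\\
G_n &:= \{\langle e,i \rangle \mid \exists m\ (\langle e,i \rangle, m) \in F_n\} \cap (G_{n-1})^c,
\end{aligned}
$$

where $\langle \cdot, \cdot \rangle : \mathbb{N} \times \mathbb{N} \to \mathbb{N}$ is a bijection and $(G_{n-1})^c$ is the complement of $G_{n-1}$.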

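If $G_n = \emptyset$ then set $k_n := k_{n-1} + 1$ and $C_n := C_{n-1}(1)$. If $G_n \ne \emptyset$ then let

$$
\begin{aligned}
m(e,i) &:= \min\{m \mid (\langle e,i \rangle, m) \in F_n\} \text{ for } \langle e,i \rangle \in G_n, \text{ and}\\
k_n &:= \max\big\{k_{n-1} + 1,\ \min\{t \in \mathbb{N} \mid \forall \langle e,i \rangle \in G_n,\ 2^{t - \langle e,i \rangle - 1} > 2m(e,i)\}\big\}.
\end{aligned}
\tag{10}
$$

Since $w(C_{n-1}) = 2^{-k_{n-1}}$ and $w(\Delta_{\langle e,i \rangle}) = 2^{-(\langle e,i \rangle + 1)}$, we have

$$
w(C_{n-1}(k_n - k_{n-1})) = w(\Delta_{\langle e,i \rangle}(k_n - \langle e,i \rangle - 1)) = 2^{-k_n}.
\tag{11}
$$

Define

$$
C_n := C_{n-1}(k_n - k_{n-1}) * \Delta_{n_1}(k_n - n_1 - 1) * \cdots * \Delta_{n_t}(k_n - n_t - 1),
\tag{12}
$$

where $G_n = \{n_1 < n_2 < \cdots < n_t\}$.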
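To make the choice of $k_n$ in (10) concrete, the following sketch computes it from $k_{n-1}$ and a finite table of the values $m(e,i)$. The function name `next_k` and the representation of $G_n$ as a dictionary from the pairing code $\langle e,i \rangle$ to $m(e,i)$ are assumptions of this example; the search starts at $t = 1$, which assumes that $\mathbb{N}$ begins at 1.

```python
def next_k(k_prev, targets):
    """k_n as in (10); targets maps each code <e,i> in G_n to m(e,i)."""
    if not targets:
        # G_n is empty: the construction simply cuts and stacks once.
        return k_prev + 1
    t = 1
    # Smallest t with 2^(t - <e,i> - 1) > 2 m(e,i) for every code in G_n.
    while not all(2 ** (t - code - 1) > 2 * m for code, m in targets.items()):
        t += 1
    return max(k_prev + 1, t)


# Usage: one code <e,i> = 2 with m(e,i) = 3 forces 2^(t-3) > 6, i.e. t = 6.
print(next_k(3, {2: 3}))
```

The inequality guarantees that the column stacked from $\Delta_{\langle e,i \rangle}$ has height greater than $2m(e,i)$, so that every point in its lower half sees at least $m(e,i)$ consecutive zeros.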

By induction, we have constructed an extending sequence of columns $C_0, C_1, \ldots$, which are compatible with $(X^0, X^1)$ and satisfy $w(C_n) = 2^{-k_n}$. Let $J := \cup_n G_n$; then $\cup_n S(C_n) = \cup_{t \in J} \Delta_t \cup X^1$.

Let $\Omega_1 := \cup_n S(C_n)$. Let $T : \Omega_1 \to \Omega_1$ be the invertible measure preserving transformation defined by $\cup_n C_n$ and $P$ be the $(T, X^0, X^1)$ process; then $P$ is ergodic.

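Let

$$
R := \{(n,m) \mid P(0^m) < 2^{-(n+2)}\}.
$$

Suppose that there is an $e$ such that

$$
P\big(\omega \mid f_e(\omega) = R\big) > 0.
\tag{13}
$$

Since $k_{n-1} < k_n$ for all $n$, we have $\lim_n h(C_n) = \infty$. Since $\forall n$, $s(C_{n-1}) \sqsubseteq s(C_n)$, we see that $s(C_1), s(C_2), \ldots$ defines a unique sequence $\alpha := \alpha_1 \alpha_2 \cdots \in \Omega$, $\forall i$, $\alpha_i \in \{0,1\}$, in the limit, i.e., $\forall n$, $s(C_n) \sqsubseteq \alpha$. Since $\Omega_1 = \cup_n S(C_n)$, we have

$$
\begin{aligned}
\xi \in \Omega_1 &\Rightarrow \exists n, i,\ 1 \le i \le h_n,\ \xi \in L_i,\ C_n = (L_1, \ldots, L_{h_n})\\
&\Rightarrow \exists i\ \ \xi(0) \cdots \xi(h_n - i) = \alpha_i \cdots \alpha_{h_n},
\end{aligned}
\tag{14}
$$

where $h_n = h(C_n)$. From (13), (14), and the definition of $G_n$, there are $i, n \in \mathbb{N}$, $1 \le i \le h_{n-1}$, such that

$$
f_e(\alpha_i \cdots \alpha_{h_{n-1}}) \subseteq R \text{ and } \langle e,i \rangle \in G_n.
\tag{15}
$$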

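From (10), we have

$$
s(\Delta_{\langle e,i \rangle}(k_n - \langle e,i \rangle - 1)) = 0^{2^{k_n - \langle e,i \rangle - 1}} \sqsupseteq 0^{2m(e,i)}.
$$

Let $(L_1, \ldots, L_h) := \Delta_{\langle e,i \rangle}(k_n - \langle e,i \rangle - 1)$. If $\xi \in \cup_{1 \le j \le h/2} L_j$ then $\xi(0)\xi(1) \cdots \sqsupseteq 0^{m(e,i)}$. Since $\lambda(S(\Delta_{\langle e,i \rangle})) = 2^{-(\langle e,i \rangle + 1)}$, we have

$$
P(0^{m(e,i)}) \ge \lambda(S(\Delta_{\langle e,i \rangle}))/2 = 2^{-(\langle e,i \rangle + 2)}.
$$

Then $(\langle e,i \rangle, m(e,i)) \in f_e(\alpha_i \alpha_{i+1} \cdots)$ and $(\langle e,i \rangle, m(e,i)) \notin R$, which contradicts (15), see Fig. 2. Thus we have

$$
\forall e,\quad P\big(\omega \mid f_e(\omega) = R\big) = 0.
$$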
Next we show that the entropy is zero. From (11), (12), and (14), for $1 \le i \le j \le h_n$, we have $P(\alpha_i \cdots \alpha_j) \ge w(C_n) = 2^{-k_n}$ and $h_n \ge 2^{k_n - k_{n-1}} h_{n-1} \ge 2^{k_n - k_0} h_0 = 2^{k_n - 1}$. Since $1/2 \le \lambda(S(C_n))$, we have $1/4 \le \lambda(\cup_{1 \le i \le h_n/2} L_i)$ and

$$
\forall n\quad P\Big(\frac{-\log_2 P(\omega_1 \cdots \omega_{h_n/2})}{h_n/2} \le k_n 2^{-k_n + 2}\Big) \ge 1/4.
\tag{16}
$$

Suppose that the entropy of $P$ is positive. Since $\lim_n k_n = \infty$ and $\lim_n h_n = \infty$, from the Shannon-McMillan-Breiman theorem,

$$
\forall \epsilon\ \exists N\ \forall n \ge N\quad P\big(-\log_2 P(\omega_1 \cdots \omega_n)/n > k_n 2^{-k_n + 2}\big) > 1 - \epsilon,
$$

which contradicts (16). Hence the entropy of $P$ is zero. □

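[Figure: $s(C_{n-1}) = \alpha_1 \cdots \alpha_h$ with $f_e(\alpha_i \cdots \alpha_h) \ni (\langle e,i \rangle, m)$; $s(C_n)$ consists of repeated copies of $s(C_{n-1})$ followed by a block of zeros that extends $0^m$.]

Fig. 2. Example of construction. For simplicity, suppose that $s(C_{n-1}) = \alpha_1 \cdots \alpha_h$, $G_n = \{\langle e,i \rangle\}$, and $(\langle e,i \rangle, m) \in f_e(\alpha_i \cdots \alpha_h)$. Then by stacking a long column of $\Delta_{\langle e,i \rangle}$, $f_e(\alpha_i \cdots \alpha_h)$ fails to guess $R$. We choose $a, b$ such that $w(C_{n-1}(a)) = w(\Delta_{\langle e,i \rangle}(b))$, $2^b > 2m$, and let $C_n := C_{n-1}(a) * \Delta_{\langle e,i \rangle}(b)$. Then by considering the trajectories starting from the first half of the levels of $\Delta_{\langle e,i \rangle}(b)$, we have $P(0^m) \ge \lambda(S(\Delta_{\langle e,i \rangle}))/2 = 2^{-(\langle e,i \rangle + 2)}$.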

IV. Proof of Theorem 2

A. Construction of a process for (4)

Here we summarize the construction of the process for (4), which we use in Theorem 2. (Actually the independent cutting and stacking method is used in [7]; however, we do not need it here.)

We construct an extending sequence of columns $C_n$, $n = 0, 1, 2, \ldots$ from $k_0 = 1 < k_1 < k_2 < \cdots \in \mathbb{N}$ by induction. Let $X := [0,1]$, $X^0 = [0, 1/2]$, $X^1 = (1/2, 1]$, $C_0 := X^1$, and $\Delta_n := (2^{-(n+1)}, 2^{-n}]$ for $n = 1, 2, \ldots$.

Stage $n$: Suppose that $C_{n-1}$ is defined and $w(C_{n-1}) = 2^{-k_{n-1}}$. Since $w(C_{n-1}(k_n - k_{n-1})) = w(\Delta_n(k_n - (n+1))) = 2^{-k_n}$, define

$$
C_n := C_{n-1}(k_n - k_{n-1}) * \Delta_n(k_n - (n+1)).
\tag{17}
$$

Then

$$
\begin{gathered}
w(C_n) = 2^{-k_n},\\
S(C_n) = \cup_{i=1}^{n} \Delta_i \cup X^1 \text{ and } \lambda(S(C_n)) = 1 - 2^{-(n+1)},\\
h(C_n) = \lambda(S(C_n))/w(C_n) = 2^{k_n}(1 - 2^{-(n+1)}).
\end{gathered}
\tag{18}
$$

Since $\lim_n w(C_n) = 0$ and $\lambda(\cup_n S(C_n)) = 1$, $\cup_n C_n$ defines an invertible ergodic transformation $T$. Let $P$ be the $(T, X^0, X^1)$ process. Let $\Delta_n(k_n - (n+1)) = (L_1, \ldots, L_h)$, $h = 2^{k_n - (n+1)}$. Since $\Delta_n \subseteq X^0$, we have $s(L_1, \ldots, L_h) = 0^h$, and if $\xi \in \cup_{i=1}^{h'} L_i$ then $\xi(0)\xi(1) \cdots \xi(h' - 1) = 0^{h'}$ for $h' = h/2$. Since $\lambda(\cup_{i=1}^{h'} L_i) = 2^{-(n+2)}$, we have

$$
2^{-(n+2)} \le P(0^{h'}) \le P\Big(\Big|P(0) - \sum_{i=1}^{h'} I_{X_i = 0}/h'\Big| \ge 1/2\Big).
$$

Thus by choosing $\{k_i\}$, we can construct an ergodic process with an arbitrarily slow convergence rate.

B. Proof 

We show Theorem 2 for the ergodic process defined in Section IV-A.

In the following we write $x^n$ for the $n$-times concatenation of $x \in S$, e.g., $(01)^2 = 0101$. From (17), we have

$$
s(C_n) = (s(C_{n-1}))^{2^{k_n - k_{n-1}}}\, 0^{2^{k_n - (n+1)}}.
\tag{19}
$$

For example, if $k_0 = 1$, $k_1 = 2$, $k_2 = 3$ then

$$
s(C_0) = 1,\quad s(C_1) = 110,\quad s(C_2) = 1101100.
$$
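The recursion (19) is easy to evaluate mechanically. The sketch below builds $s(C_n)$ as a Python string from the sequence $k_0, k_1, \ldots$; the function name `column_name` and the list encoding of $\{k_i\}$ are assumptions of this example, and the code presumes $k_0 = 1$ and $k_i \ge i + 1$ so that every exponent is nonnegative.

```python
def column_name(ks, n):
    """s(C_n) computed from k_0..k_n by the recursion (19)."""
    s = "1"  # s(C_0), because C_0 = X^1
    for i in range(1, n + 1):
        copies = 2 ** (ks[i] - ks[i - 1])   # number of copies of s(C_{i-1})
        zeros = 2 ** (ks[i] - (i + 1))      # height of the stacked zero column
        s = s * copies + "0" * zeros
    return s


# Usage: the two choices of k_0, k_1, k_2 that appear in the text and in Fig. 3.
print(column_name([1, 2, 3], 2))
print(column_name([1, 2, 4], 2))
```

The length of the returned string equals $h(C_n) = 2^{k_n}(1 - 2^{-(n+1)})$, which gives a quick consistency check against (18).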

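From (17), (18), and (19) we see that

$$
\begin{aligned}
&P(0 1^{2^{k_1 - 1}} 0) = w(C_1) = 2^{-k_1},\\
&P(0 1^{n} 0) = 0 \text{ if } n \ne 2^{k_1 - 1} \text{ and } n \ne 0,\\
&P(1 0^{f(n)} 1) = w(C_n) = 2^{-k_n},\quad f(n) = \sum_{i=1}^{n} 2^{k_i - (i+1)},\\
&P(1 0^{m} 1) = 0 \text{ if } \forall n\ m \ne f(n) \text{ and } m \ne 0.
\end{aligned}
\tag{20}
$$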
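Let

$$
B_0 := C_0 \text{ and } B_n := \cup_{i=1}^{(2^{k_n - k_{n-1}} - 1) h(C_{n-1})} L_i
\tag{21}
$$

for $C_n = (L_1, \ldots, L_{h_n})$, $n \ge 1$. From (17) and Lemma 2 below, we have

$$
\lambda(\cap_{i=0}^{n} B_i) = \lambda(B_0) \prod_{i=1}^{n} (1 - 2^{-(k_i - k_{i-1})}),
$$

see Fig. 3. Assume that $\forall i$ $k_i - k_{i-1} \ge i$. Since $\sum_{i \ge 1} 2^{-(k_i - k_{i-1})} \le 1$, we have

$$
\lambda(\cap_{i=0}^{\infty} B_i) > 0.
$$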
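The product formula for $\lambda(\cap_{i=0}^{n} B_i)$ can be evaluated exactly with rational arithmetic. In the sketch below the function name `measure_of_intersection` and the list encoding of $\{k_i\}$ are assumptions of this example; it uses $\lambda(B_0) = \lambda(X^1) = 1/2$ from the construction in Section IV-A.

```python
from fractions import Fraction


def measure_of_intersection(ks, n):
    """lambda(B_0 ∩ ... ∩ B_n) via the product formula, as an exact rational."""
    value = Fraction(1, 2)  # lambda(B_0) = lambda(X^1)
    for i in range(1, n + 1):
        value *= 1 - Fraction(1, 2 ** (ks[i] - ks[i - 1]))
    return value


# Usage with k_0 = 1, k_1 = 2, k_2 = 4, the example of Fig. 3.
print(measure_of_intersection([1, 2, 4], 1), measure_of_intersection([1, 2, 4], 2))
```

For this choice the values agree with counting levels in Fig. 3: four levels of width $2^{-4}$ for $B_0 \cap B_1$ and three for $B_0 \cap B_1 \cap B_2$.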

Let $\xi \in \cap_{i=0}^{\infty} B_i$ and $\phi(\xi)' := \xi(0)\xi(1) \cdots \in \Omega$. Then

(*) the first time that the pattern $10^n1$ appears in $\phi(\xi)'$ is less than that of $10^m1$ if $n < m$, $P(10^n1) > 0$, and $P(10^m1) > 0$.

We have (*) as follows: Let $\xi \in \cap_{i=0}^{\infty} B_i$. Let $B(x) := \{n \mid 10^n1 \text{ appears in } x\}$, where we write $10^01 = 11$. Since $k_1 - k_0 \ge 1$, from (17) and (21), we have $11 \sqsubseteq \phi(\xi)'$. Let $x_0 := 11$; then (*) trivially holds for $x_0$ and $B(x_0) = \{0\}$. From (17) and (21), there are $x_{n-1} \in S$ and $k \ge 1$ such that $x_{n-1} y 1 \sqsubseteq \phi(\xi)'$ for $y := s(C_{n-1})^k\, 0^{2^{k_n - (n+1)}}$. Suppose that (*) holds for $x_{n-1}$ and $B(x_{n-1}) = \{f(i) \mid 0 \le i \le n-1\}$, where $f(0) = 0$. Since $B(s(C_{n-1})) \subseteq B(x_{n-1})$ and the first bit of $s(C_{n-1})$ is 1, we have $B(x_{n-1}) = B(z)$ for $x_{n-1} \sqsubseteq z \sqsubseteq x_{n-1} y$ and $B(x_{n-1} y 1) = B(x_{n-1}) \cup \{f(n)\}$. Let $x_n := x_{n-1} y 1$; then (*) holds for $x_n$ and $B(x_n) = \{f(i) \mid 0 \le i \le n\}$. By induction, (*) holds for $\phi(\xi)'$, see Fig. 3.

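Let $K := \{(i, k_i) \mid i \in \mathbb{N}\}$. The patterns $10^n1$ and $01^m0$ appear in $\phi(\xi)'$ iff $P(10^n1) > 0$ and $P(01^m0) > 0$. Hence, from (20) and (*), we can compute $K$ from $\phi(\xi)'$ if $\xi \in \cap_i B_i$. Thus there is a partial computable $g$ such that (i) if $g(x)$ is defined then $g(x) = g(z)$ for $x \sqsubseteq z$, (ii) $g(\omega) := \cup_{x \sqsubseteq \omega} g(x)$, and (iii)

$$
P\big(\omega \mid g(\omega) = K\big) \ge \lambda(\cap_{i=0}^{\infty} B_i) > 0.
\tag{22}
$$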
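One way to picture the computation of $K$ from a sample is to read off the lengths of the completed zero-runs $10^m1$ and invert $f(n) - f(n-1) = 2^{k_n - (n+1)}$ from (20). The sketch below does this for a finite prefix; it is only an illustration under the assumption that the prefix comes from a point of $\cap_i B_i$, so that by (*) the run lengths $f(1) < f(2) < \cdots$ appear in increasing order with none skipped. The function name `decode_ks` is hypothetical, and this is not claimed to be the function $g$ of the proof.

```python
import re


def decode_ks(prefix: str):
    """Recover k_1, k_2, ... from the zero-runs 1 0^m 1 completed inside prefix."""
    # Lookarounds leave the bounding 1s unconsumed so adjacent runs are all found.
    runs = sorted({len(m.group()) for m in re.finditer(r"(?<=1)0+(?=1)", prefix)})
    ks, prev = [], 0
    for n, f_n in enumerate(runs, start=1):
        step = f_n - prev                 # should equal 2^(k_n - (n+1)) by (20)
        if step & (step - 1):             # not a power of two
            raise ValueError("zero-run lengths do not match (20)")
        ks.append(n + 1 + step.bit_length() - 1)
        prev = f_n
    return ks


# Usage: s(C_2) for k = (1, 2, 4), followed by the start of the next copy.
print(decode_ks("11011011011000" + "1101"))
```

Only runs closed by a trailing 1 are used, which mirrors the fact that $g$ may commit to a value only after the corresponding pattern has fully appeared in the sample.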
[Figure: $C_n = C_{n-1}^{1} * \cdots * C_{n-1}^{2^{k_n - k_{n-1}}} * \Delta_n(k_n - (n+1))$, drawn for $C_2$ with levels $L_1, \ldots, L_{14}$ and names $s(L_1) \cdots s(L_{14}) = 11011011011000$.]

Fig. 3. $C_{n-1}^{i}$, $1 \le i \le 2^{k_n - k_{n-1}}$, is a $2^{k_n - k_{n-1}}$ partition of $C_{n-1}$. Note that $s(C_{n-1})$ ends with $10^{f(n-1)}$ and does not contain the pattern $10^{f(n-1)}1$. Since $s(C_{n-1})$ starts with 1, the trajectories starting from $B_n$ contain the pattern $10^{f(n-1)}1$. For example, let $k_0 = 1$, $k_1 = 2$, $k_2 = 4$; then $s(C_0) = 1$, $s(C_1) = 110$, $s(C_2) = 11011011011000$. $B_0$ is the union of the $L_j$ such that $s(L_j) = 1$. $B_0 \cap B_1 = L_1 \cup L_4 \cup L_7 \cup L_{10}$, and $B_0 \cap B_1 \cap B_2 = L_1 \cup L_4 \cup L_7$. The trajectories starting from $B_0 \cap B_1$ always contain the pattern 11 and those from $B_0 \cap B_1 \cap B_2$ always contain the patterns 11 and 101.
Let

$$
P_n(x) := \lambda\big(\cup\{L_i \mid \exists\, 1 \le i \le j \le h,\ x = s(L_i \cdots L_j)\}\big),
$$

for $C_n = (L_1, \ldots, L_h)$. We have $P_n(x) \le P_{n+1}(x)$ and $P_n(x) \ge P_n(x0) + P_n(x1)$ for all $n$ and $x$. Since (i) $P_n(x)$ is computable from $s(C_n)$ and $w(C_n) = 2^{-k_n}$ and (ii) $s(C_n)$ is computable from $k_0, \ldots, k_n$, we have that $P_n(x)$ is computable from $k_0, \ldots, k_n$. Since $\lambda(\cup_n S(C_n)) = 1$, we have $\lim_n P_n(x) = P(x)$. Since $P$ is a probability, we can compute $P(x)$ with any given precision from $K$. Thus $P$ is effectively estimated from $\phi(\xi)'$, $\xi \in \cap_i B_i$.
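The lower approximations $P_n(x)$ can be computed directly from $k_0, \ldots, k_n$: each occurrence of $x$ as a block of consecutive levels of $C_n$ contributes one level of width $2^{-k_n}$. The sketch below follows that reading; the function names `column_name` and `lower_estimate` are assumptions of this example, and `column_name` simply re-implements the recursion (19) so that the block is self-contained.

```python
from fractions import Fraction


def column_name(ks, n):
    """s(C_n) from k_0..k_n via (19); assumes k_0 = 1 and k_i >= i + 1."""
    s = "1"
    for i in range(1, n + 1):
        s = s * 2 ** (ks[i] - ks[i - 1]) + "0" * 2 ** (ks[i] - (i + 1))
    return s


def lower_estimate(x, ks, n):
    """P_n(x): measure of the levels of C_n at which the name of the column reads x."""
    s = column_name(ks, n)
    count = sum(1 for i in range(len(s) - len(x) + 1) if s[i:i + len(x)] == x)
    return Fraction(count, 2 ** ks[n])  # each level has width w(C_n) = 2^{-k_n}


# Usage with k_0 = 1, k_1 = 2, k_2 = 4.
ks = [1, 2, 4]
print(lower_estimate("0", ks, 1), lower_estimate("0", ks, 2))
```

Since $P_n(x)$ increases to $P(x)$ and the values over all words of a fixed length sum to at most 1, running the computation for all words of that length until the total mass is close to 1 yields $P(x)$ to any desired precision.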

Finally we show that the entropy is zero. Since $w(C_n) = 2^{-k_n}$ and $h(C_n) = 2^{k_n}(1 - 2^{-(n+1)})$, we have

$$
\lim_{n} \frac{-\log P(s(C_n))}{h(C_n)} = 0.
$$

Since $\lambda(\cup_n S(C_n)) = 1$, from an argument similar to that for the previous theorem, we see that the entropy is zero. □

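Lemma 2. Let $C := (L_1, \ldots, L_h)$ and $(L'_1, \ldots, L'_{2^k h}) := C(k)$. Then for $0 \le n \le 2^k - 1$ and $J \subseteq \{1, \ldots, h\}$,

$$
\begin{gathered}
\cup_{j \in J} L_j \cap \cup_{i = nh+1}^{(n+1)h} L'_i = \cup_{i \in J'} L'_i,\quad J' = \{j + nh \mid j \in J\},\\
\lambda(\cup_{j \in J'} L'_j) = 2^{-k} \lambda(\cup_{j \in J} L_j).
\end{gathered}
$$

Proof) Since $C(k)$ is a concatenation of the $2^k$ columns of equal width that partition $C$, the lemma follows. □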